Abstract

Whether it is for audit or for recovery purposes, data checkpointing is an important problem
of distributed database systems. Transactions establish dependence relations on data
checkpoints taken by data object managers. So, given an arbitrary set of data checkpoints
(including at least a single data checkpoint from a data manager, and at most one data
checkpoint from each data manager), an important question is the following one: "Can these
data checkpoints be members of the same consistent global checkpoint?".

This paper answers this question by providing a necessary and sufficient condition suited for
database systems. Moreover, to show the usefulness of this condition, two non-intrusive data
checkpointing protocols are derived from it. It is also interesting to note that this paper,
by exhibiting "correspondences", establishes a bridge between the data object/transaction
model and the process/message-passing model.

1 Introduction



Checkpointing the state of a database is important for audit or recovery purposes. When compared
to its counterpart in distributed systems, the database checkpointing problem additionally has to
take into account the serialization order of the transactions that manipulate the data objects
forming the database. Transactions create dependencies among data objects, which makes the
problem of defining consistent global checkpoints harder in database systems.

Of course, it is always possible, in a database environment, to design a special transaction
that reads all data objects and saves their current values. The underlying concurrency control
mechanism ensures that this transaction gets a consistent state of the data objects. But this
strategy is inefficient, intrusive (from the point of view of the concurrency control [13]) and
not practical, since a read-only transaction may take a very long time to execute and may cause
intolerable delays for other transactions [?]. Moreover, as pointed out by Salem and
Garcia-Molina [?], this strategy may drastically increase the cost of rerunning aborted
transactions. So, it is preferable to base global checkpointing (1) on local checkpoints of data
objects taken by their managers, and (2) on a mechanism ensuring mutual consistency of local
checkpoints (this ensures that it will always be possible to get consistent global checkpoints
by piecing together local checkpoints).

*This work was done during a stay of Michel Raynal at the Dipartimento di Informatica e
Sistemistica of Rome, supported by a grant of the Consiglio Nazionale delle Ricerche in the
context of the Short-Term-Mobility program.

In this paper we are interested in exploiting such an approach. We consider a database in
which each data object can be individually checkpointed (note that a data object could include,
practically, a set of physical data items). If these checkpoints are taken independently,
there is a risk that no consistent global checkpoint can ever be formed (this leads to the
well-known domino effect [?]). So, some kind of coordination is necessary when local checkpoints
are taken so that they are mutually consistent. In this paper we are interested in characterizing
mutual consistency of local checkpoints. More precisely, we are interested in the following two
points.

• First, we address the following question: "Given an arbitrary set S of checkpoints, can this
set be extended to get a global checkpoint (i.e., a set including exactly one checkpoint from
each data object) that is consistent?". The answer to this question is well known when the
set S includes exactly one checkpoint per data object [10]. It becomes far from trivial
when the set S is incomplete, i.e., when it includes checkpoints from only a subset of data
objects. When S includes a single data checkpoint, the previous question is equivalent to
"Can this local checkpoint belong to a consistent global checkpoint?".

• Then, we focus on data checkpointing protocols. Let us consider the property "Local check- 
point C belongs to a consistent global checkpoint". We design two non-intrusive protocols. 
The first one ensures the previous property when C is any local checkpoint. The second one 
ensures it when C belongs to a predefined set of local checkpoints. 

The paper consists of 4 main sections. Section 2 introduces the database model we consider
in this paper. Section 3 defines consistency of global checkpoints. Section 4 answers the previous
question. To provide such an answer, it studies the kind of dependencies both the transactions and
their serialization order create among checkpoints of distinct data objects. More specifically, it
is shown that, while some data checkpoint dependencies are causal, and consequently can be
captured on the fly [?], some others are "hidden", in the sense that they cannot be revealed by
causality. It is the existence of those hidden dependencies that actually makes the answer to the
previous question non-trivial. Then, Section 5 shows how the necessary and sufficient condition
stated in Section 4 can be used to design "transaction-induced" data checkpointing protocols
ensuring the property "Local checkpoint C belongs to a consistent global checkpoint". These
protocols allow managers of data objects to take checkpoints independently of each other¹, and
use transactions as a means to diffuse information, among data managers, encoding dependencies on
the previous states of data objects. When a transaction that accessed a data object is committed,
the data manager of this object may be directed to take a checkpoint so that previously taken
checkpoints belong to consistent global checkpoints. Such a checkpoint is called a forced
checkpoint. This is done by the data manager, which exploits both its local control data and the
information exchanged at the transaction commit point.

Last but not least, this paper can be seen as a bridge between the area of distributed computing
and the area of databases. For a long time, databases have provided distributed computing with
very interesting problems and protocols related to data replication, concurrency control, etc. We
show here how database checkpointing can benefit from studies that originated from distributed
computing. Actually, a similar question has been addressed in the context of the asynchronous
process/message-passing model [1], [?]. In this context a message establishes a simple relation
between a pair of process local states. In the database context, a transaction may establish
several relations between states of data objects. So, although there are some correspondences
between the process/message-passing model and the data object/transaction model², it appears that
extending process/message-passing model results to the context of database transactions is not
trivial, as a transaction is "something" more complicated than a message: a transaction operates
on several data objects at a time, and accesses them by read and write operations whose results
depend on the serialization order.

¹ These checkpoints are called basic. They can be taken, for example, during CPU idle time.

2 Database Model 

We consider a classical distributed database model. The system consists of a finite set of data
objects, a set of transactions and a concurrency control mechanism (see [?], [6] for more details).

Data objects. Each data object is managed by a data manager DM. A set of data objects can
be managed by the same data manager DM. For clarity, we suppose that the set of data objects
managed by the same DM constitutes a single logical data object. So, there is a data manager
$DM_x$ per data object $x$.

Transactions. A transaction is defined as a partial order on read and write operations on data
objects and terminates with a commit or an abort operation. $R_i(x)$ (resp. $W_i(x)$) denotes a
read (resp. write) operation issued by transaction $T_i$ on data object $x$. Each transaction is
managed by an instance of the transaction manager (TM) that forwards its operations to the
scheduler, which runs a specific concurrency control protocol. The write set of a transaction is
the set of all the data objects it wrote.

Concurrency control. A concurrency control protocol schedules read and write operations issued
by transactions in such a way that any execution of transactions is strict and serializable. This
is not a restriction, as concurrency control mechanisms used in practice (e.g., two-phase locking
2PL and timestamp ordering) generate schedules ensuring both properties [?]. The strictness
property states that no data object may be read or written until the transaction that currently
writes it either commits or aborts. So, a transaction actually writes a data object at its commit
point. Hence, at some abstract level, which is the one considered by our checkpointing mechanisms,
transactions execute atomically at their commit points. If a transaction is aborted, strictness
ensures no cascading aborts and the possibility to use before-images for implementing abort
operations, which restore the value an object had before the transaction accessed it. For example,
a 2PL mechanism that requires transactions to keep their write locks until they commit (or abort)
generates such a behavior [?].

Distributed database. We consider a distributed database as a finite set of sites, each site 
containing one or several (logical) data objects. So, each site contains one or more data managers, 
and possibly an instance of the TM. TMs and DMs exchange messages on a communication network 
which is asynchronous (message transmission times are unpredictable but finite) and reliable (each 
message will eventually be delivered). 



² At some abstraction level, there are similarities, on one side between processes and data
objects, and on the other side, between messages and transactions (see Section 4.4).



Execution. Let $T = \{T_1, T_2, \ldots, T_n\}$ be a set of transactions accessing a set
$D = \{d_1, d_2, \ldots, d_m\}$ of data objects (to simplify notations, data object $d_i$ is
identified by its index $i$). An execution $E$ over $T$ is a partial order on all read and write
operations of the transactions belonging to $T$; this partial order respects the order defined in
each transaction. Moreover, let $<_x$ be the partial order defined on all operations accessing a
data object $x$, i.e., $<_x$ orders all pairs of conflicting operations (two operations are
conflicting if they access the same object and one of them is a write). Given an execution $E$
defined over $T$, $T$ is structured as a partial order $(T, <_T)$ where $<_T$ is the following
(classical) relation defined on $T$:

$$T_i <_T T_j \iff (i \neq j) \wedge \exists x : \big( (R_i(x) <_x W_j(x)) \vee (W_i(x) <_x W_j(x)) \vee (W_i(x) <_x R_j(x)) \big)$$
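As a rough illustration of this relation, the Python sketch below derives the pairs of $<_T$ from a scheduled history. The encoding of a history as `(transaction, operation, object)` triples, the function name `transaction_order` and the sample history are assumptions made for this example only; they are not notation from the paper.

```python
def transaction_order(history):
    """Return the pairs (Ti, Tj) such that Ti <_T Tj.

    history is a list of (transaction, op, obj) triples in scheduled order,
    where op is "R" or "W" (a hypothetical encoding chosen for this sketch).
    """
    pairs = set()
    for pos, (t_i, op_i, obj_i) in enumerate(history):
        for t_j, op_j, obj_j in history[pos + 1:]:
            # Two operations conflict if they access the same object
            # and at least one of them is a write.
            conflicting = obj_i == obj_j and "W" in (op_i, op_j)
            if t_i != t_j and conflicting:
                pairs.add((t_i, t_j))
    return pairs


# Hypothetical history: T1 reads x before T2 writes it; T2 and T3 only read y.
HISTORY = [("T1", "R", "x"), ("T2", "W", "x"), ("T2", "R", "y"), ("T3", "R", "y")]
ORDER = transaction_order(HISTORY)
```

Here `ORDER` contains only the pair `("T1", "T2")`: the two reads of `y` do not conflict, so `T2` and `T3` remain unordered.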



3 Consistent Global Checkpoints 



3.1 Local States and Their Relations 

Each write on a data object $x$ issued by a transaction defines a new version of $x$. Let
$\sigma_x^i$ denote the $i$-th version of $x$; $\sigma_x^i$ is called a local state. Transactions
establish dependencies between local states. This can be formalized in the following way. When
$T_k$ issues a write operation $W_k(x)$, it moves the state of $x$ from $\sigma_x^i$ to
$\sigma_x^{i+1}$. More precisely, $\sigma_x^i$ and $\sigma_x^{i+1}$ are the local states of $x$
just before and just after the execution³ of $T_k$, respectively. This can be expressed in the
following way by extending the relation $<_T$ to include local states:

$$T_k \text{ changes } x \text{ from } \sigma_x^i \text{ to } \sigma_x^{i+1} \iff (\sigma_x^i <_T T_k) \wedge (T_k <_T \sigma_x^{i+1})$$

Let $<_T^+$ be the transitive closure of the extended relation $<_T$. When we consider only local
states, we get the following happened-before relation denoted $<_{LS}$ (which is similar to
Lamport's happened-before relation defined on process events [?] in the process/message-passing
model):

Definition 3.1 (Precedence on local states, denoted $<_{LS}$)

$$\sigma_x^i <_{LS} \sigma_y^j \iff \sigma_x^i <_T^+ \sigma_y^j$$

Figure 1: Partial Order on Local States (a. $T_1$ precedes $T_2$; b. $T_2$ precedes $T_1$)



³ Recall that, as we consider strict and serializable executions, "just before and just after the
execution of $T_k$" means "just before and just after $T_k$ is committed".



As the relation $<_T$ defined on transactions is a partial order, it is easy to see that the
relation $<_{LS}$ defined on local states is also a partial order. Figure 1 shows examples of
relation $<_{LS}$. It considers three data objects $x$, $y$, and $z$, and two transactions $T_1$
and $T_2$. Transactions are defined in the following way:

$T_1$ : $R_1(x)$; $W_1(y)$; $W_1(z)$; $commit_1$
$T_2$ : $R_2(y)$; $W_2(x)$; $commit_2$

As there is a read-write conflict on $x$, two serialization orders are possible. Figure 1.a
displays the case $T_1 <_T T_2$ while Figure 1.b displays the case $T_2 <_T T_1$. Each horizontal
axis depicts the evolution of the state of a data object. For example, the second axis is devoted
to the evolution of $y$: $\sigma_y^{i_y}$ and $\sigma_y^{i_y+1}$ are the states of $y$ before and
after $T_1$, respectively.

Let us consider Figure 1.a. It shows that $W_1(y)$ and $W_1(z)$ add four pairs of local states to
the relation $<_{LS}$, namely:

$$\sigma_y^{i_y} <_{LS} \sigma_y^{i_y+1} \qquad \sigma_z^{i_z} <_{LS} \sigma_z^{i_z+1} \qquad \sigma_y^{i_y} <_{LS} \sigma_z^{i_z+1} \qquad \sigma_z^{i_z} <_{LS} \sigma_y^{i_y+1}$$

Precedences on local states due to write operations of transactions $T_1$ and $T_2$ are indicated
with continuous arrows, while the ones due to the serialization order are indicated with dashed
arrows⁴. Figure 1.b shows which precedences are changed when the serialization order is reversed.

3.2 Consistent Global States 

A global state of the database is a set of local states, one from each data object. A global state
$G = \{\sigma_1^{i_1}, \sigma_2^{i_2}, \ldots, \sigma_m^{i_m}\}$ is consistent if it does not
contain two dependent local states, i.e., if:

$$\forall x, y \in \{1, \ldots, m\} : \neg(\sigma_x^{i_x} <_{LS} \sigma_y^{i_y})$$

Let us consider again Figure 1.a. The three global states
$(\sigma_x^{i_x}, \sigma_y^{i_y}, \sigma_z^{i_z})$,
$(\sigma_x^{i_x}, \sigma_y^{i_y+1}, \sigma_z^{i_z+1})$ and
$(\sigma_x^{i_x+1}, \sigma_y^{i_y+1}, \sigma_z^{i_z+1})$ are consistent. The global state
$(\sigma_x^{i_x+1}, \sigma_y^{i_y}, \sigma_z^{i_z+1})$ is not consistent, both because
$\sigma_y^{i_y} <_{LS} \sigma_x^{i_x+1}$ (due to the fact that $T_1 <_T T_2$) and because
$\sigma_y^{i_y} <_{LS} \sigma_z^{i_z+1}$ (due to the fact that $T_1$ writes both $y$ and $z$).
Intuitively, a non-consistent global state of the database is a global state that could not be
seen by any omniscient observer of the database. It is possible to show that, as in the
process/message-passing model, the set of all the consistent global states is a partial order [?].
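The example of Figure 1.a can be replayed with a small Python sketch. It is only an illustration under stated assumptions: transactions are taken to execute atomically in commit order (as Section 2 allows), a local state is encoded as an `(object, version)` pair with version 0 standing for the state before any write, and the names `local_state_precedence`, `is_consistent`, `FIG_1A` and `PREC` are hypothetical.

```python
def local_state_precedence(serial_order, objects):
    """Return the pairs (s, t) of local states such that s <_LS t.

    serial_order lists (name, read set, write set) in commit order;
    a local state is encoded as (object, version number).
    """
    version = {o: 0 for o in objects}
    succ = {}                                   # edges of the extended relation <_T

    def add_edge(a, b):
        succ.setdefault(a, set()).add(b)

    committed = []
    for name, reads, writes in serial_order:
        for prev, prev_reads, prev_writes in committed:
            # Conflicting accesses: same object, at least one write.
            if prev_writes & (reads | writes) or prev_reads & writes:
                add_edge(prev, name)            # prev <_T name
        for o in sorted(writes):
            add_edge((o, version[o]), name)     # state just before the write
            version[o] += 1
            add_edge(name, (o, version[o]))     # state just after the write
        committed.append((name, reads, writes))

    def reachable(start):                       # transitive closure from one node
        seen, stack = set(), [start]
        while stack:
            for nxt in succ.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    states = [(o, v) for o in objects for v in range(version[o] + 1)]
    # Transactions are named by strings, local states are tuples.
    return {(s, t) for s in states for t in reachable(s) if isinstance(t, tuple)}


def is_consistent(global_state, precedence):
    """global_state maps each object to the version number of its chosen local state."""
    chosen = list(global_state.items())
    return not any((s, t) in precedence for s in chosen for t in chosen)


# Figure 1.a: T1 = R1(x); W1(y); W1(z) is serialized before T2 = R2(y); W2(x).
FIG_1A = [("T1", {"x"}, {"y", "z"}), ("T2", {"y"}, {"x"})]
PREC = local_state_precedence(FIG_1A, ["x", "y", "z"])
```

With this encoding, `is_consistent({"x": 1, "y": 0, "z": 1}, PREC)` is false, matching the non-consistent global state discussed above, while the three global states listed as consistent pass the check.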

3.3 Consistent Global Checkpoints 

A local checkpoint (or equivalently a data checkpoint) of a data object $x$ is a local state of
$x$ that has been saved in a safe place⁵ by the data manager of $x$. So, all the local checkpoints
are local states, but only a subset of local states are defined as local checkpoints. Let $C_x^i$
denote the $i$-th local checkpoint of $x$; so, $C_x^i$ corresponds to some $\sigma_x^j$ with
$i \le j$. A global checkpoint is a set of local checkpoints, one for each data object. It is
consistent if it is a consistent global state.

We assume that all initial local states are checkpointed. Moreover, we also assume that, when
we consider any point of an execution $E$, each data object will eventually be checkpointed.

⁴ This shows that dependencies between local states can be of two types: the ones that are due to
each transaction taken individually, and the ones that are due to conflicting operations issued by
distinct transactions (i.e., due to the serialization order).

⁵ For example, if $x$ is stored on a disk, a copy is saved on another disk.

Figure 2: A Serialization Order

Figure 3: Data Checkpoint Dependencies



4 Dependence on Data Checkpoints 

4.1 Introductory Example 

As indicated in the previous section, due to write operations of each transaction, or due to the
serialization order, transactions create dependencies among local states of data objects. Let us
consider the following 7 transactions accessing data objects $x$, $y$, $z$ and $u$:

$T_1$ : $W_1(u)$; $commit_1$
$T_2$ : $W_2(z)$; $commit_2$
$T_3$ : $W_3(z)$; $W_3(x)$; $commit_3$
$T_4$ : $R_4(u)$; $W_4(z)$; $commit_4$
$T_5$ : $W_5(y)$; $W_5(z)$; $commit_5$
$T_6$ : $W_6(y)$; $commit_6$
$T_7$ : $W_7(x)$; $commit_7$



Figure 2 depicts the serialization imposed by the concurrency control mechanism. Figure 3
describes dependencies between local states generated by this execution. Five local states are
defined as data checkpoints (they are indicated by dark rectangles). We study dependencies between
those data checkpoints. Let us first consider $C_u^{i_u}$ and $C_y^{i_y}$. $C_u^{i_u}$ is the
(checkpointed) state of $u$ before $T_1$ wrote it, while $C_y^{i_y}$ is the (checkpointed) state
of $y$ after $T_6$ wrote it (i.e., just after $T_6$ is committed). The serialization order (see
Figure 2) shows that $T_1 <_T T_6$, and consequently $C_u^{i_u} <_{LS} C_y^{i_y}$, i.e., the data
checkpoint $C_y^{i_y}$ is causally dependent [?] on the data checkpoint $C_u^{i_u}$ (Figure 3
shows that there is a directed path of local states from $C_u^{i_u}$ to $C_y^{i_y}$). Now let us
consider the pair of data checkpoints consisting of $C_u^{i_u}$ and $C_x^{i_x}$. Figure 3 shows
that $C_u^{i_u}$ precedes $T_1$, and that $C_x^{i_x}$ follows $T_7$. Figure 2 indicates that $T_1$
and $T_7$ are not connected in the serialization graph. So, there is no causal dependence between
$C_u^{i_u}$ and $C_x^{i_x}$ (Figure 3 shows that there is no directed path from $C_u^{i_u}$ to
$C_x^{i_x}$). But, as the reader can check, there is no consistent global checkpoint including
both $C_u^{i_u}$ and $C_x^{i_x}$⁶. So there is a hidden dependence between $C_u^{i_u}$ and
$C_x^{i_x}$ which prevents them from belonging to the same consistent global checkpoint. We now
provide a definition of dependence that takes into account both causal dependencies and hidden
dependencies.

4.2 Dependence Path 

Definition 4.1 (Interval) A checkpoint interval $I_x^i$ is associated with data checkpoint
$C_x^i$. It consists of all the local states $\sigma_x$ such that:

$$(\sigma_x = C_x^i) \vee (C_x^i <_{LS} \sigma_x <_{LS} C_x^{i+1})$$

As an example, Figure 3 shows that $I_z^{i_z}$ includes 4 consecutive local states of $z$. Note
that, due to the assumptions on data checkpoints stated in Section 3.3, any local state belongs to
exactly one interval. Let us call an edge of the partial order on local states a dependence edge.

Definition 4.2 (Dependence Path⁷)

There is a dependence path (DP) from a data checkpoint $C_x^i$ to $C_y^j$ (denoted
$C_x^i \xrightarrow{DP} C_y^j$) iff:
i) $x = y$ and $i < j$; or
ii) there is a sequence $(d_1, d_2, \ldots, d_n)$ of dependence edges, such that:

1) $d_1$ starts after $C_x^i$;

2) $\forall q : 1 \le q < n$, let $I_z^k$ be the interval in which $d_q$ arrives; then $d_{q+1}$
starts in the same or in a later interval (i.e., an interval $I_z^h$ such that $k \le h$⁸);

3) $d_n$ arrives before $C_y^j$.

4.3 Necessary and Sufficient Condition 

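Theorem 4.1 Let $\mathcal{I} \subseteq \{1, \ldots, m\}$ and $S = \{C_x^{i_x}\}_{x \in \mathcal{I}}$
be a set of data checkpoints. Then $S$ is a part of a consistent global checkpoint if and only if:

$$(\mathcal{P}) \quad \forall x, y \in \mathcal{I} : \neg(C_x^{i_x} \xrightarrow{DP} C_y^{i_y})$$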

⁶ Adding $C_y^{i_y}$ and $C_z^{i_z}$ to $C_u^{i_u}$ and $C_x^{i_x}$ cannot produce a consistent
global state as $C_z^{i_z} <_{LS} C_x^{i_x}$. Adding $C_z^{i_z+1}$ instead of $C_z^{i_z}$ has the
same effect as $C_u^{i_u} <_{LS} C_z^{i_z+1}$.

⁷ This definition generalizes the Z-path notion introduced in [?] for asynchronous message-passing
systems. A Z-path is a sequence of messages. While a message is a "concrete entity", a dependence
edge is an "abstract entity". See Section 4.4. So, as shown by the next theorem, the dependence
edge abstraction allows extending the results of [?], [?], [?] to data checkpoints.

⁸ Note that $d_{q+1}$ can start before $d_q$ arrives. This is where the dependence is "hidden".
If, $\forall q$, $d_{q+1}$ starts after $d_q$ arrives, then the dependence path
$(d_1, d_2, \ldots, d_n)$ is purely causal.

Proof

Sufficiency. We prove that if $(\mathcal{P})$ is satisfied then $S$ can be included in a
consistent global checkpoint. Let us consider the global checkpoint defined as follows:

• if $x \in \mathcal{I}$, we take $C_x^{i_x}$;

• if $x \notin \mathcal{I}$, for each $y \in \mathcal{I}$ we consider the integer
$m_x(y) = \min\{i \mid \neg(C_x^i \xrightarrow{DP} C_y^{i_y})\}$ (with $m_x(y) = 0$ if $i_y = 0$
or if this set is empty). Then we take $C_x^{i_x}$ with $i_x = \max_{y \in \mathcal{I}}(m_x(y))$.
Let us note that, from that definition, it is possible that $i_x = 0$ (in that case, $C_x^{i_x}$
is an initial data checkpoint).

By construction, this global checkpoint satisfies the two following properties:

$$\forall x \notin \mathcal{I}, \ \forall y \in \mathcal{I} : \neg(C_x^{i_x} \xrightarrow{DP} C_y^{i_y}) \qquad (1)$$

$$\forall x \notin \mathcal{I} \text{ such that } i_x > 0, \ \exists z \in \mathcal{I} : (i_z > 0) \wedge (C_x^{i_x - 1} \xrightarrow{DP} C_z^{i_z}) \qquad (2)$$

We show that $\{C_1^{i_1}, C_2^{i_2}, \ldots, C_m^{i_m}\}$ is consistent. Assume the contrary. So,
there exist $x$ and $y$ and a dependence edge $d$ that starts after $C_x^{i_x}$ and arrives before
$C_y^{i_y}$. So, it follows that:

$$(i_y > 0) \wedge (C_x^{i_x} \xrightarrow{DP} C_y^{i_y}) \qquad (3)$$

Four cases have to be considered:

1. $x \in \mathcal{I}$, $y \in \mathcal{I}$. (3) is contradicted by assumption $(\mathcal{P})$.

2. $x \in \mathcal{I}$, $y \notin \mathcal{I}$. Since $i_y > 0$, from (2) we have:
$\exists z \in \mathcal{I} : (i_z > 0) \wedge (C_y^{i_y - 1} \xrightarrow{DP} C_z^{i_z})$.
As, at data $y$, both the dependence edge ending the path $C_x^{i_x} \xrightarrow{DP} C_y^{i_y}$
and the dependence edge starting the path $C_y^{i_y - 1} \xrightarrow{DP} C_z^{i_z}$ belong to the
same interval, we conclude from (3) that
$\exists z \in \mathcal{I} : (i_z > 0) \wedge (C_x^{i_x} \xrightarrow{DP} C_z^{i_z})$, which
contradicts the assumption $(\mathcal{P})$.

3. $x \notin \mathcal{I}$, $y \in \mathcal{I}$. (3) contradicts (1).

4. $x \notin \mathcal{I}$, $y \notin \mathcal{I}$. Since $i_y > 0$, from (2) we have:
$\exists z \in \mathcal{I} : (i_z > 0) \wedge (C_y^{i_y - 1} \xrightarrow{DP} C_z^{i_z})$.
As in case 2, we can conclude that
$\exists z \in \mathcal{I} : (i_z > 0) \wedge (C_x^{i_x} \xrightarrow{DP} C_z^{i_z})$, which
contradicts (1).

Necessity. We prove that, if there is a consistent global checkpoint
$\{C_1^{i_1}, C_2^{i_2}, \ldots, C_m^{i_m}\}$ including $S$, then property $\mathcal{P}$ holds for
any $\mathcal{I} \subseteq \{1, \ldots, m\}$. Assume the contrary. So, there exist
$x \in \mathcal{I}$ and $y \in \mathcal{I}$ such that $C_x^{i_x} \xrightarrow{DP} C_y^{i_y}$. From
the definition of $\xrightarrow{DP}$, there exists a sequence of dependence edges
$d_1, d_2, \ldots, d_p$ such that:

$d_1$ starts in $I_x^{i_x}$,
$d_1$ arrives in $I_{x_1}^{i_1}$, $d_2$ starts in $I_{x_1}^{j_1}$ with $i_1 \le j_1$,
...
$d_{p-1}$ arrives in $I_{x_{p-1}}^{i_{p-1}}$, $d_p$ starts in $I_{x_{p-1}}^{j_{p-1}}$ with $i_{p-1} \le j_{p-1}$,
$d_p$ arrives in $I_y^{i_y - 1}$.

We show by induction on $p$ that, $\forall t \ge i_y$, $C_x^{i_x}$ and $C_y^t$ cannot belong to the
same consistent global checkpoint.

Base step. $p = 1$. In this case, $d_1$ starts after $C_x^{i_x}$ and arrives before $C_y^{i_y}$,
and consequently the pair $(C_x^{i_x}, C_y^{i_y})$ cannot belong to a consistent global checkpoint.

Induction step. We suppose the result true for some $p \ge 1$ and show that it holds for $p + 1$.
We have:

$d_1$ starts in $I_x^{i_x}$,
...
$d_p$ arrives in $I_{x_p}^{i_p}$, $d_{p+1}$ starts in $I_{x_p}^{j_p}$ with $i_p \le j_p$,
$d_{p+1}$ arrives in $I_y^{i_y - 1}$.

From the induction assumption applied to the path of dependence edges $d_1, \ldots, d_p$, we have:
for any $t \ge i_p + 1$, $C_x^{i_x}$ and $C_{x_p}^t$ cannot belong to the same consistent global
checkpoint. Moreover, $d_{p+1}$ starts in $I_{x_p}^{j_p}$ and arrives in $I_y^{i_y - 1}$ imply
that, for any $h \le j_p$ and for any $t \ge i_y$, $C_{x_p}^h$ and $C_y^t$ cannot belong to the
same consistent checkpoint. Since $i_p \le j_p$, it follows that no checkpoint of $x_p$ can be
included with $C_x^{i_x}$ and $C_y^{i_y}$ to form a consistent global checkpoint. □
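Definition 4.2 and condition $(\mathcal{P})$ can be explored with a small Python sketch. It is a simplification under explicit assumptions: every dependence edge is supposed to be already labelled with the interval in which it starts and the interval in which it arrives, "starts after $C_x^i$" is read as "starts in interval $i$ or later", and "arrives before $C_y^j$" as "arrives in an interval with index lower than $j$". The tuple encoding and the names `has_dependence_path`, `satisfies_condition_p` and `EDGES` are hypothetical.

```python
from itertools import product

# Assumed encoding of a dependence edge:
#   (source object, source interval index, target object, target interval index)


def has_dependence_path(edges, src, dst):
    """Return True if there is a dependence path from checkpoint src to checkpoint dst.

    src and dst are (object, checkpoint index) pairs.
    """
    (x, i), (y, j) = src, dst
    if x == y and i < j:                      # case i) of Definition 4.2
        return True
    # earliest[z] = smallest interval of z in which an edge of a partial path
    # has arrived; the next edge may start in that interval or in a later one.
    earliest = {x: i}                         # d_1 must start after C_x^i
    worklist = [x]
    while worklist:
        z = worklist.pop()
        for src_obj, src_itv, dst_obj, dst_itv in edges:
            if src_obj != z or src_itv < earliest[z]:
                continue                      # edge does not extend a path at z
            if dst_obj == y and dst_itv < j:  # d_n arrives before C_y^j
                return True
            if dst_itv < earliest.get(dst_obj, dst_itv + 1):
                earliest[dst_obj] = dst_itv
                worklist.append(dst_obj)
    return False


def satisfies_condition_p(edges, checkpoints):
    """Condition (P): no dependence path between any two checkpoints of the set."""
    return not any(has_dependence_path(edges, a, b)
                   for a, b in product(checkpoints, repeat=2))


# Hypothetical edges: one from u (interval 0) to z (interval 1),
# one from z (interval 1) to x (interval 0).
EDGES = [("u", 0, "z", 1), ("z", 1, "x", 0)]
```

In this toy instance, `has_dependence_path(EDGES, ("u", 0), ("x", 1))` holds whether or not the second edge starts before the first one arrives, since only intervals matter; consequently `satisfies_condition_p(EDGES, [("u", 0), ("x", 1)])` is false, while the set `[("u", 1), ("x", 1)]` satisfies the condition.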



4.4 Database Systems vs Message-Passing Systems 

Messages vs transactions. An analogous result for message-passing systems has been established
in [?] and generalized in [1]. As indicated in Section 1, point-to-point message-passing systems
are characterized by the fact that each message generates exactly one dependence edge between two
process local checkpoints. In database systems, a dependence edge is due either to a write
operation or to the serialization order. As a transaction may issue several write operations and
is serialized in some order by the concurrency control mechanism, it follows that it may generate
many dependence edges between data checkpoints. For example, when a transaction writes $a$ data
objects, these writes establish $a^2$ dependence edges, and supplementary edges are added
according to the serialization order.

Consistency of a recovery line. Let us call Recovery Line⁹ a line joining all the data
checkpoints of a global checkpoint. A recovery line is consistent iff the associated global
checkpoint is consistent. Let us recall the black and dashed arrows introduced in the example of
Section 3.1: a black arrow denotes a local checkpoint precedence created by a transaction, while a
dashed arrow denotes a local checkpoint precedence created by the serialization order. When
considering such black and dashed arrows (see Figure ?), it is possible to show that a recovery
line $\mathcal{L}$ is consistent iff:

• No black arrow crosses $\mathcal{L}$.

• No dashed arrow crosses $\mathcal{L}$ from the right of $\mathcal{L}$ to the left of $\mathcal{L}$.

In a message-passing system, a recovery line (cut) is consistent iff no message crosses it from
the right to the left [?]. Messages crossing the recovery line from left to right are "in-transit"
with respect to the recovery line. This intuitively shows that, in a message-passing system: (1)
a message corresponds to a "dashed arrow", and (2) there is no "black arrow". So, it appears
that consistency of global checkpoints is a more involved problem in database systems than in
message-passing systems.

5 Deriving "Transaction-Induced" Checkpointing Protocols 

Required Properties. If we suppose that the set $S$ includes only a checkpoint $C_x^{i_x}$, the
previous Theorem leads to an interesting corollary $\mathcal{C}$:

Corollary 5.1 $C_x^{i_x}$ belongs to a consistent global checkpoint if and only if
$\neg(C_x^{i_x} \xrightarrow{DP} C_x^{i_x})$.

Providing checkpointing protocols ensuring property $\mathcal{C}$ is interesting for two reasons:

- (1) It avoids wasting time in taking a data checkpoint that will never be used in any consistent
global checkpoint, and

- (2) No domino effect can ever take place, as any data checkpoint belongs to a consistent global
checkpoint¹⁰.

⁹ Also called cut, when adopting the distributed computing terminology.



Moreover, let us consider the following property $\mathcal{P}$: "If it exists, the set $S_n$
formed by the data checkpoints with the same index $n > 0$ (one from each data object), is a
consistent global checkpoint". In the following we provide two checkpointing protocols:

• The first protocol (A) guarantees $\mathcal{C}$ for all local checkpoints, and guarantees
$\mathcal{P}$ for any value of $n$.

• The second protocol (B) ensures $\mathcal{C}$ only for a subset of local checkpoints, and
$\mathcal{P}$ for some particular values of $n$.

Actually, those protocols can be seen as adaptations, to the data-object/transaction model, of
protocols developed for the process/message-passing model. More precisely, protocol A corresponds
to Briatico et al.'s protocol [?], while protocol B corresponds to Wang-Fuchs's protocol [14].

Local Control Variables. In both protocols we assume each data manager $DM_x$ has an index
$i_x$, which indicates the index (rank) of the last checkpoint of $x$ (it is initialized to zero).
Moreover, each data manager can take checkpoints independently (basic checkpoints), for example,
by using a periodic algorithm which could be implemented by associating a timer with each data
manager. A local timer is set whenever a checkpoint is taken. When a local timer expires, a basic
checkpoint is taken by the data manager. Data managers are directed to take additional data
checkpoints (forced checkpoints) in order to ensure $\mathcal{C}$ or $\mathcal{P}$. The decision
to take forced checkpoints is based on the control information piggybacked by transactions.

A protocol consists of two interacting parts. The first part, shared by both algorithms, specifies 
the checkpointing-related actions of transaction managers. The second part defines the rules data 
managers have to follow to take data checkpoints. 

Protocols A and B: Behavior of a Transaction Manager. Let $W_{T_i}$ be the write set of a
transaction $T_i$ managed by a transaction manager $TM_i$. We assume that each time an operation
of $T_i$ is issued by $TM_i$ to a data manager $DM_x$, the data manager returns the value of $x$
plus its index $i_x$. $TM_i$ stores in $M_{T_i}$ the maximum value among the indices of the data
objects read or written by $T_i$. When transaction $T_i$ is committed, the transaction manager
$TM_i$ sends a COMMIT message to each data manager $DM_x$ involved in $W_{T_i}$. Such COMMIT
messages piggyback $M_{T_i}$.



¹⁰ When, after a crash, a data manager recovers, it can restore its last data checkpoint $C$. It
follows from $\mathcal{C}$ that $C$ belongs to a consistent global checkpoint. So the database can
be restarted as soon as each data manager has restored its data checkpoint contained in a
consistent global checkpoint including $C$. Note that, when compared to message-passing systems,
no "channel state" has to be restored.

Protocol A: Behavior of a Data Manager. As far as checkpointing is concerned, the behavior
of a data manager $DM_x$ is defined by the two following procedures, namely take-basic-ckpt and
take-forced-ckpt. They define the rules associated with checkpointing.

take-basic-ckpt(A):

When the timer expires:
(AB1) $i_x \leftarrow i_x + 1$;
(AB2) Take checkpoint $C_x^{i_x}$;
(AB3) Reset the local timer.

take-forced-ckpt(A):

When $DM_x$ receives COMMIT($M_{T_i}$) from $TM_i$:
if $i_x < M_{T_i}$ then
    (A1) $i_x \leftarrow M_{T_i}$;
    (A2) Take a (forced) checkpoint $C_x^{i_x}$;
    (A3) Reset the local timer.
endif;
(A4) Process the COMMIT message.

From the increase of the index $i_x$ of a data object $x$, and from the rule take-forced-ckpt(A)
(which forces a data checkpoint whenever $i_x < M_{T_i}$), the condition
$\neg(C_x^{i_x} \xrightarrow{DP} C_x^{i_x})$ follows for any data checkpoint. Actually, this
simple protocol ensures that, if $C_x^{i_x} \xrightarrow{DP} C_y^{i_y}$, then the index $i_x$
associated with $C_x^{i_x}$ is strictly less than the index $i_y$ associated with $C_y^{i_y}$.

It follows from the previous observation that if two data checkpoints have the same index,
then they cannot be related by $\xrightarrow{DP}$. So, all the sets $S_n$ that exist are
consistent. Note that the take-forced-ckpt(A) rule may produce gaps in the sequence of indices
assigned to data checkpoints of a data object $x$. So, from a practical point of view, the
following remark is interesting: when no data checkpoint of a data object $x$ is indexed by a
given value $n$, then the first data checkpoint of $x$ whose index is greater than $n$ can be
included in a set containing data checkpoints indexed by $n$, to form a consistent global
checkpoint.
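To make the interplay between transaction managers and data managers more concrete, here is a minimal Python sketch of protocol A. It is an illustration only: timers are omitted (the caller invokes `take_basic_ckpt` directly), checkpoints are kept in an in-memory list rather than in a safe place, and the names `DataManager`, `access`, `on_commit` and `run_transaction` are hypothetical.

```python
class DataManager:
    """Data manager DM_x running protocol A (timers are left out of this sketch)."""

    def __init__(self, name, value=None):
        self.name = name
        self.value = value
        self.index = 0                                  # i_x, initialized to zero
        self.checkpoints = [(0, "initial", value)]      # initial state is checkpointed

    def access(self):
        """Serve a read or write request: return the value of x and its index i_x."""
        return self.value, self.index

    def take_basic_ckpt(self):
        """Rule take-basic-ckpt(A), to be called when the local timer expires."""
        self.index += 1                                                  # (AB1)
        self.checkpoints.append((self.index, "basic", self.value))       # (AB2)

    def on_commit(self, m_t, new_value):
        """Rule take-forced-ckpt(A), triggered by COMMIT(M_T)."""
        if self.index < m_t:
            self.index = m_t                                             # (A1)
            self.checkpoints.append((self.index, "forced", self.value))  # (A2)
        self.value = new_value                                           # (A4)


def run_transaction(read_set, write_set):
    """Transaction manager side: collect M_T, then send COMMIT to the write set.

    read_set is a list of data managers; write_set maps data managers to new values.
    """
    m_t = max(dm.access()[1] for dm in list(read_set) + list(write_set))
    for dm, new_value in write_set.items():
        dm.on_commit(m_t, new_value)
    return m_t
```

For instance, if `x` takes a basic checkpoint (index 1) and a transaction then reads `x` and writes `y`, the COMMIT message carries $M_{T_i} = 1$ and `y` is forced to checkpoint its previous value under index 1 before the write is applied.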

Protocol B: Behavior of a Data Manager. This protocol introduces a system parameter
$Z > 1$ known by all the data managers [14]. Only for the subset of data checkpoints whose index
is equal to $\alpha \times Z$ (where $\alpha > 0$ is an integer), we have:
$\neg(C_x^{\alpha Z} \xrightarrow{DP} C_x^{\alpha Z})$. Moreover, when, $\forall x$,
$C_x^{\alpha Z}$ exists, then the global checkpoint $S_{\alpha Z}$ exists and is consistent.

The rule take-basic-ckpt(B) is the same as that of protocol A. In addition to the previous control
variables, each data manager $DM_x$ has an additional variable $V_x$, which is incremented by $Z$
each time a data checkpoint indexed $\alpha Z$ is taken. The rule take-forced-ckpt(B) is the
following.

take-forced-ckpt(B):

When $DM_x$ receives COMMIT($M_{T_i}$) from $TM_i$:
if $V_x < M_{T_i}$ then
    (B1) $i_x \leftarrow \lfloor M_{T_i}/Z \rfloor \times Z$;
    (B2) Take a (forced) checkpoint $C_x^{i_x}$;
    (B3) Reset the local timer;
    (B4) $V_x \leftarrow V_x + Z$.
endif;
(B5) Process the COMMIT message.
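As a worked instance of step (B1), with values chosen only for illustration ($Z = 3$ and $M_{T_i} = 7$ are assumptions, not values from the paper):

```latex
\[
  i_x \leftarrow \left\lfloor \frac{M_{T_i}}{Z} \right\rfloor \times Z
      = \left\lfloor \frac{7}{3} \right\rfloor \times 3
      = 2 \times 3
      = 6
\]
```

So the forced checkpoint receives the largest multiple of $Z$ that does not exceed $M_{T_i}$ as its index.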

About coordination. Compared to previous checkpointing protocols that appeared in the literature
[?], [?], which use an explicit coordination among data managers to get consistent global
checkpoints, the proposed protocols provide the same result by using a lazy coordination which is
propagated among data managers by transactions (with COMMIT messages). In particular, protocol
A starts a "transaction-induced" coordination each time a basic checkpoint is taken, while
protocol B starts a coordination each time a basic checkpoint whose index is a multiple of the
parameter $Z$ is taken. The latter protocol seems to be particularly interesting for database
systems as it shows a tradeoff, controlled by the system parameter $Z$, between the number of
forced checkpoints and the extent of rollback during a recovery phase. The greater $Z$ is, the
larger the rollback distance will be.

6 Conclusion 

This paper has presented a formal approach for consistent data checkpoints in database systems.
Given an arbitrary set of data checkpoints (including at least a single data checkpoint from a
data manager, and at most one data checkpoint from each data manager), we answered the following
important question "Can these data checkpoints be members of the same consistent global
checkpoint?" by providing a necessary and sufficient condition. We have also derived two
non-intrusive data checkpointing protocols from this condition; these checkpointing protocols use
transactions as a means to diffuse information among data managers.

This paper can also be seen as a bridge between the area of distributed computing and the area
of databases. We have shown that the checkpointing problem is harder in data-object/transaction
systems than in process/message-passing systems. From a distributed computing point of view, we
could say that database systems are difficult because they merge the "synchronous world" (every
transaction taken individually has to be perceived as atomic: it can be seen as a multi-rendezvous
among the objects it operates on) and the "asynchronous world" (due to relations among
transactions managed by the concurrency control mechanism).